On the combination of adaptive neuro-fuzzy inference system and deep residual network for improving detection rates on intrusion detection

Deep Residual Networks (ResNets) are prone to overfitting in problems with uncertainty, such as intrusion detection problems. To alleviate this problem, we propose a method that combines the Adaptive Neuro-fuzzy Inference System (ANFIS) and the ResNet algorithm. This method makes use of the advantages of both ANFIS and ResNet and alleviates the overfitting problem of ResNet. Compared with the original ResNet algorithm, the proposed method provides overlapped intervals of continuous attributes and fuzzy rules to ResNet, improving the fuzziness of ResNet. The proposed method is implemented and evaluated on the benchmark NSL-KDD dataset, and its performance is compared with the original ResNet algorithm and other deep learning-based and ANFIS-based methods. The experimental results demonstrate that the proposed method outperforms the original ResNet and other existing methods on various metrics, reaching a 98.88% detection rate and 1.11% false alarm rate on the KDDTrain+ dataset.


Introduction
Nowadays, leading technologies increase cyber risks for users and businesses. According to the Cisco Annual Internet Report (2018-2023) White Paper [1], the threat of network intrusions is growing year by year. There was a 776% growth in attacks between 100 Gbps and 400 Gbps from 2018 to 2019, and over half of the operators experienced infrastructure outages. Advances in technologies such as e-commerce, mobile payments, cloud computing, Big Data and analytics, IoT, AI, machine learning, and social media are the main driver of economic growth but have also led to a higher incidence of cyberattacks [1]. As one of the key technologies for ensuring network security, intrusion detection plays an increasingly important role.
The main contributions of this paper are as follows:
• First, we propose a new and deep architecture for intrusion detection problems. This architecture enables continuous attributes to keep their uncertainty property during deep training.
• Second, to identify deeper patterns in the training data, we use ResNet to train the model with the fuzzy rules generated by ANFIS and concrete attributes. Meanwhile, to optimize the model, we provide a mechanism that connects the ANFIS and ResNet to co-train the two algorithms with the losses.
• Third, to evaluate the performance of the proposed method, we implement it with Python 3.7, PyTorch, and sklearn on the NSL-KDD dataset. The experimental results show improved performance compared with ResNet and other methods, which indicates that ResNet's overfitting on problems with uncertainty is alleviated.
The remainder of this paper is organized as follows. In Section II, we review the related research in the field of intrusion detection, especially how ANFIS-based and deep learning-based methods facilitate the development of intrusion detection. Section III gives a brief introduction to Pearson correlation analysis, K-means, ANFIS, and ResNet. Section IV describes the proposed method. Then, Section V evaluates the proposed method with a discussion of the experimental results and a comparison with a few previous studies on the NSL-KDD dataset. Finally, the conclusions are discussed in Section VI.

Related work
Deep learning has demonstrated its effectiveness in dimensionality reduction and classification tasks. It was proposed by Hinton [7] based on the deep belief network (DBN) in 2006. The first real multilayer structure learning algorithm, the convolutional neural network (CNN), was proposed by LeCun et al. in [8]; it utilizes spatial relative relations to reduce the number of parameters and improve training performance. Since then, many different deep learning architectures, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have been proposed and have achieved success in various fields, such as image and video recognition, audio processing, natural language processing, autonomous systems, and robotics [9].
In the field of intrusion detection, deep learning also has received special attention in recent years. The following describes recent research in intrusion detection and categorizes them according to the different basic architectures.
DBN has proved to be the most influential deep and generative neural network that learns one layer of features from unlabeled data. It has been widely studied in intrusion detection, for example in [10][11][12][13]. Y. Zhang, Li, & Wang [14] proposed a method that combines an improved Genetic Algorithm (GA) with DBN. This algorithm uses GA to optimize the number of hidden layers and the number of neurons in DBN. As a result, the architecture of DBN is optimized and a higher detection rate is achieved.
Numerous ANN-based approaches such as [2][15][16][17][18][19] are applied in IDSs. For instance, in [20], G. Wang, Hao, Mab, & Huang presented a fuzzy clustering enhanced ANN, FC-ANN, which uses fuzzy clustering to generate different training subsets for ANNs. A fuzzy aggregation module then aggregates the ANNs' results. Their IDS approach thus applies ANN, fuzzy clustering, and fuzzy aggregation to recognize intrusions accurately. Likewise, in another ANN-based IDS scheme [21], Ashfaq et al. presented a fuzzy semi-supervised ANN, which can be trained on a combination of labeled and unlabeled samples. A divide-and-conquer strategy (data categorized into different groups according to the magnitude of fuzziness) is used for categorizing the unlabeled samples and incorporating each category with the original training set. The ANN is implemented to output a fuzzy membership vector and retrain the conquering category. The authors claimed that the unlabeled samples belonging to the fuzziness groups make major contributions to improving the ANN's performance compared with the existing classifier. In [22], the authors applied ANN with a multiverse optimizer (MVO, a new natural evolutionary algorithm). They used MVO to train an ANN to identify new attacks. Furthermore, they conducted the experiment on both NSL-KDD and the new benchmark dataset UNSW-NB15. The authors demonstrated that their solution has better performance than PSO-ANN and other methods.
Various architectures of RNN have been implemented in IDSs because of RNN's ability to extract temporal features from the input data, such as [19][23][24][25]. For example, the IDS scheme introduced in [12] proposed a two-level LSTM model for self-adaptive detection in 5G mobile networks. The first level of the approach is a supervised or semi-supervised learning approach that enables DBN or Stacked AutoEncoders (SAE) to run as fast as possible on each Radio Access Network (RAN). Then, all the collected symptoms from the first level are sent to the Network Anomaly Detection (NAD) component, where they are assembled and used as input for an LSTM recurrent network. The authors used the well-known botnet dataset CTU for training and validation. They also showed that the architecture can self-adapt the anomaly detection system based on the volume of network flows and optimize resource consumption.
The IDS schemes presented in [26] and [25] are CNN-based. In [26], the authors treat traffic data as images to train a CNN. In [25], the authors use CNN to extract meaningful features from IDS big data, and they use weight-dropped LSTM to retain long-term dependencies among extracted features and prevent overfitting problems.
Fewer ResNet-based IDSs have been designed to deal with intrusions. The ResNet-based scheme presented in [27] converts the network traffic data into image form and trains a ResNet over the converted data. However, its performance is reported only for binary classification. In [6], Yuelei Xiao and Xing Xiao presented a simplified residual network (S-ResNet) and tested it on the NSL-KDD dataset. They used metrics such as accuracy and F1-score for performance analysis. This scheme's most important attribute is its capability to prevent ResNet's overfitting problem on low-dimensional and small-scale datasets by simplifying the network structure.
Fuzzy logic is an effective method to provide fuzziness in attributes and reduce the overfitting problem of algorithms. Norbert Wiener, the founder of cybernetics, once pointed out that man's superiority over the most perfect machine is that man is capable of using fuzzy concepts [28]. This shows that there is an essential difference between the human brain and the computer. Thus, if we want to simulate the human brain, fuzzy logic is essential. Fuzzy schemes have successfully proved their ability to detect intrusions and malicious behaviors in the presence of uncertain data [29]. Therefore, a large number of fuzzy approaches have been successfully applied in IDSs. Mohammad Masdari and Hemn Khezri [29] categorized various fuzzy intrusion detection schemes into nine categories in 2020, as shown in Fig 3. They also illustrated that the ANFIS classifier is one of the most used classifiers in studies of misuse detection schemes.
The ANFIS algorithm combines the uncertainty processing ability of fuzzy logic with the learning process of ANNs. ANFIS was first used in IDSs in 2007. Toosi & Kahani [30] used five ANFIS modules to explore intrusive activity, reaching a 95.3% detection rate on the KDDCUP99 dataset. Then, Chan et al. [31] presented a policy-enhanced fuzzy model with ANFIS characteristics. Devi et al. [32] introduced an IDS scheme using ANFIS to detect security attacks on 5G wireless networks.
Regarding the combination of ANFIS and other algorithms, D. Karaboga and E. Kaya [33] proposed a hybrid artificial bee colony (ABC) algorithm to train ANFIS. This algorithm uses arithmetic crossover to converge quickly and has better efficiency than the standard ABC algorithm. Altyeb Altaher presented an IDS scheme, EHNFC, in [34], which is an evolutionary neuro-fuzzy classifier for malware classification. It can use fuzzy rules to detect fuzzy malware and improve its detection accuracy by learning new fuzzy rules to evolve its structure. In addition, it uses an improved fuzzy rule updating clustering method to update the centroid and radius of the clustering permission feature. These changes to the application of clustering methods improve the convergence of clustering and create rules that adapt to the input data, thereby improving the accuracy. The scheme introduced in [35] adopts data mining methods such as neural fuzzy and radial basis support vector machine to achieve a high detection rate when dealing with security attacks. The method is mainly divided into four stages, and k-means clustering is used to generate parameter-tuning subsets. Based on these subsets, various neural fuzzy models are trained to form classification vectors of support vector machines.
To address the problem that large data volumes expand the network, which raises the false alarm rate and lowers the detection accuracy, Manimurugan, Majdi, Mohmmed, Narmatha, & Varatharajan [36] presented the CSO-ANFIS algorithm. This algorithm uses the Crow Search Optimization algorithm to optimize ANFIS and reaches a 95.8% detection rate on the NSL-KDD dataset.
ANFIS is widely used and studied in intrusion detection systems and misuse detection systems. However, the original ANFIS algorithm has only five layers, which limits its ability to extract deeper features, and the combination of ANFIS with deeper networks for deeper feature extraction has been little studied.
In summary, deep learning has become more and more attractive in the intrusion detection field, and many deep learning-based models have been applied to intrusion detection. Among these models, ResNet has attracted more and more attention due to its deeper feature extraction ability. However, few ResNet-based methods are applied in IDSs because of the limitations mentioned above. To alleviate the overfitting problem of ResNet on intrusion detection problems and apply ResNet effectively to them, fuzzy logic is a potential solution.

Pearson correlation analysis
The Pearson correlation coefficient r is usually used to measure whether there is a linear relationship between two objects, as shown below:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \quad (1)$$
The coefficient r describes the degree of linear correlation between x and y. Usually, r > 0 indicates x and y are positively correlated, and r < 0 indicates x and y are negatively correlated. The larger the absolute value of r, the stronger the correlation. In normal circumstances, the correlation strength of the attribute is judged by the following value range: 0.8-1.0 very strong correlation, 0.6-0.8 strong correlation, 0.4-0.6 moderate correlation, 0.2-0.4 weak correlation, 0.0-0.2 very weak correlation or no correlation.
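The coefficient and the strength bands above can be illustrated with a short Python sketch. The function names `pearson_r` and `strength` are hypothetical, and assigning each shared band boundary (for example 0.8) to the stronger band is an assumption, since the ranges in the text overlap at their edges.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Map |r| to the verbal bands listed in the text (boundary handling assumed)."""
    a = abs(r)
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak or none"
```

For instance, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` describes a perfectly linear positive relationship, which `strength` places in the "very strong" band.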

K-means algorithm
K-means is one of the most popular clustering algorithms because it is very flexible, simple, intuitive, easy to implement, and fast in execution [37]. The main steps of the K-means clustering algorithm are as follows:

Stage 1
According to the data ranges of n data objects, k cluster centers are randomly initialized.

Stage 2
Assign each object to the group with the closest center.

Stage 3
The location of each center is updated by calculating the average of the objects assigned to it.

Stage 4
Stage 2 and Stage 3 are repeated until the maximum number of iterations is reached, or until the cluster centers no longer move.
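The four stages can be sketched for one-dimensional data as follows. This is a minimal illustration, not the implementation used in the study: the function name `kmeans_1d` is hypothetical, and the centers are spaced evenly over the data range instead of being drawn at random, purely so that the sketch is reproducible.

```python
def kmeans_1d(values, k, max_iter=100):
    """1-D K-means; returns (centers, clusters) from the last iteration."""
    lo, hi = min(values), max(values)
    # Stage 1: place k centers inside the data range
    # (evenly spaced here, an assumption replacing random initialization).
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Stage 2: assign each value to the cluster with the closest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Stage 3: move each center to the mean of its assigned values
        # (an empty cluster keeps its previous center).
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        # Stage 4: stop when the centers no longer move.
        if updated == centers:
            break
        centers = updated
    return centers, clusters
```

With `values = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]` and `k = 2`, the sketch separates the three small values from the three large ones.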

Adaptive Neuro-Fuzzy Inference System-ANFIS
The ANFIS algorithm [38] employs a fuzzy inference module based on empirical knowledge to make the final decision. The basic structure of most fuzzy inference systems (FISs) is a model that maps the input characteristics to the input membership functions (MF). ANFIS is a multilayer feedforward network consisting of five layers. Each layer in the ANFIS architecture contains nodes defined by node functions, as shown in Fig 4. The five layers of ANFIS are:
Layer1: Calculate the membership grade of the inputs with the membership function (MF). In this study, we used Gaussian MF.
Where x_{1n} is the input to the node A_{1n}, n is the index of the attribute, and j is the linguistic label (small, large, etc.) of attribute n. A_{1nj} is the membership that determines the degree to which input x_{1n} satisfies A_{1nj}. mean_n represents the center of the Gaussian function, and sigma_n represents the width of the Gaussian function. mean_n and sigma_n are the learning parameters. They are referred to as premise parameters. As the values of these parameters change, the Gaussian functions vary accordingly.
Layer2: Firing strengths. Each node in this layer calculates the firing strength of the i-th rule.
Where i is the node number of Layer2. Each node output represents the firing strength of a rule.
Layer3: Calculate the normalized firing strength of the i-th rule.
Where a is the number of nodes in Layer2.
Layer4: Adaptive node with a linear function. Each node calculates the weighted value of the consequent part of each rule.
Where p_i, q_i, r_i are the learning parameters. These parameters are referred to as consequent parameters.
Layer5: Produce the overall output by aggregating all the fired rule values.
ANFIS uses a hybrid learning mechanism to train the model. The main learning parameters of ANFIS are premise parameters and consequent parameters. In the forward pass of hybrid learning, the node output is propagated to the fourth layer, and the least-squares method is used to estimate the consequent parameters. In the backward pass, the loss (the difference between the expected output and the actual output) is propagated back to the first layer, and the premise parameters are updated using gradient descent while the consequent parameters are fixed.
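A forward pass through the five layers can be sketched as below. The sketch assumes the commonly used Gaussian MF exp(-(x - mean)^2 / (2 sigma^2)) and a first-order Sugeno consequent (a linear function of the inputs plus a constant); these forms, as well as the names `gaussian_mf` and `anfis_forward`, are assumptions for illustration. Training (least squares and gradient descent) is not shown.

```python
import math
from itertools import product

def gaussian_mf(x, mean, sigma):
    """Gaussian membership grade (assumed standard form)."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def anfis_forward(inputs, premise, consequent):
    """inputs: attribute values.
    premise: per attribute, a list of (mean, sigma), one per linguistic label.
    consequent: per rule, len(inputs) linear coefficients followed by a constant.
    Returns (overall output, normalized firing strengths)."""
    # Layer1: membership grade of every input for every linguistic label.
    grades = [[gaussian_mf(x, m, s) for m, s in labels]
              for x, labels in zip(inputs, premise)]
    # Layer2: one rule per combination of labels; firing strength = product.
    firing = []
    for combo in product(*grades):
        w = 1.0
        for g in combo:
            w *= g
        firing.append(w)
    # Layer3: normalize the firing strengths.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer4: weight each rule's linear consequent by its normalized strength.
    weighted = []
    for w_bar, coeffs in zip(normalized, consequent):
        f = sum(c * x for c, x in zip(coeffs, inputs)) + coeffs[-1]
        weighted.append(w_bar * f)
    # Layer5: aggregate all weighted rule outputs.
    return sum(weighted), normalized
```

With two inputs and two labels per input, Layer2 produces 2 x 2 = 4 rules, which shows why the number of rules grows multiplicatively with the number of attributes and labels.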

Deep residual Network-ResNet
ResNet is a variant of CNN. It is a large-scale convolutional neural network constructed from residual blocks, 20 times deeper than AlexNet [39] and 8 times deeper than VGG-16 [40]. It excels in image recognition and won the image classification and object recognition tasks in the 2015 ImageNet Large Scale Visual Recognition Competition. It also outperforms the third version of GoogLeNet [41]. ResNet comes from an artificial intelligence team at Microsoft [42]. Because of the residual mechanism, the network can be deeper than traditional networks, which effectively avoids the vanishing gradient problem and the training difficulties of deep networks. The structure of the residual block is shown in Fig 5. In Fig 5, x is the input of a residual block, and f(x) is the output of the residual block before the second activation function. That is to say, f(x) = W_2 σ(W_1 x), where W_1 and W_2 are the weights of the first and second layers, and σ is the rectified linear unit (ReLU) activation function. The output of the residual block is σ(f(x) + x).
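The residual block formula σ(f(x) + x) with f(x) = W_2 σ(W_1 x) can be made concrete with a small sketch. It uses dense weight matrices on plain Python lists, which is a simplification: real ResNet blocks use convolutions and batch normalization. The names `relu`, `matvec`, and `residual_block` are hypothetical.

```python
def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, a) for a in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    """Compute sigma(f(x) + x) with f(x) = W2 * sigma(W1 * x)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    # Shortcut connection: the input x is added back before the activation.
    return relu([f + a for f, a in zip(fx, x)])
```

When both weight matrices are zero, the block reduces to `relu(x)`, which shows how the shortcut lets the input pass through even if the learned branch contributes nothing.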
In the original paper that proposed ResNet, five different architectures are presented, including 18-layer, 34-layer, 50-layer, 101-layer, and 152-layer ResNet. The more layers, the more time it takes to train and predict. Fig 6 shows the network structure of ResNet18. There are 8 basic block modules in ResNet18. The arrow in the basic block is the shortcut connection. These shortcut connections allow the neural network to be deeper because gradients can transfer farther in the backpropagation process.

PLOS ONE
Adaptive neuro-fuzzy inference system and deep residual network for intrusion detection

Proposed methodology
In this section, we first describe the preprocessing techniques in the proposed method, including the attribute selection method and the techniques for initializing the overlapped intervals of continuous attributes. Then, the proposed method is explained in detail.

Preprocessing
First, we conducted correlation analysis and attribute selection for the data. Since Layer2 of ANFIS uses the multiplication of all incoming signals to form rules, a large number of rules are generated, as shown in Eq (3). Therefore, it is necessary to conduct preliminary attribute selection for problems with many attributes.
In this study, we used Pearson correlation analysis to select the continuous attributes that will be the input of the proposed method's ANFIS-part, with 0.1 as the minimum threshold for selecting attributes. An important advantage of using Pearson correlation analysis to select continuous attributes is that the selected continuous attributes have a higher degree of linear correlation with the target.
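The selection step can be sketched as a filter that keeps the attributes whose absolute correlation with the target exceeds the 0.1 threshold. The function names are hypothetical, the toy values below are made up for illustration and are not taken from NSL-KDD, and treating a constant column as uncorrelated is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 for a constant column (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_attributes(columns, target, threshold=0.1):
    """Keep attributes whose |r| with the target is greater than the threshold."""
    selected = {}
    for name, values in columns.items():
        r = pearson_r(values, target)
        if abs(r) > threshold:
            selected[name] = r
    return selected

# Toy data: 1 marks an attack sample, 0 a normal sample.
toy_target = [0, 0, 1, 1, 0, 1]
toy_columns = {
    "count": [1, 2, 8, 9, 2, 7],   # rises with the target
    "noise": [1, 2, 1, 2, 3, 3],   # unrelated to the target
}
toy_selected = select_attributes(toy_columns, toy_target)
```

In this toy case only `count` passes the threshold, so only it would be forwarded to the ANFIS-part.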
Second, the standard deviation is used to determine the original interval number of continuous attributes selected by Pearson correlation analysis. Standard deviation reflects the degree of dispersion of an attribute. The larger the standard deviation is, the more intervals are needed to represent the dispersion. Therefore, in this study, we related the original number of attribute intervals to the exponent of the standard deviation. We used the exponent of the standard deviation of each selected continuous attribute plus 2 as the original interval number.
Third, to further minimize and check the number of intervals, we proposed an adaptive K-means to dynamically determine the number of intervals. Although K-means is one of the most popular clustering algorithms [37], it has some disadvantages, one of which is that the number of clusters k must be known in advance. To dynamically determine the minimum number of clusters k, an adaptive K-means algorithm is proposed (see Fig 7).
The adaptive K-means algorithm is based on the following four stages:

Stage 1
Min-max scaler normalization is used to normalize all attributes. This eliminates the dimensional influence among attributes.

Stage 2
K-means is used to calculate the cluster centers of each interval. The original interval number of K-means is calculated from the standard deviation, which is the exponent of the standard deviation of each attribute plus 2.

Stage 3
Divide the data into intervals and calculate the sigma and mean of each interval as below. If one interval's sigma is less than 0.03, which means insufficient dispersion of data on this interval, the clustering number is reduced by 1, and the algorithm returns to Stage 2 for re-clustering.
$$sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - mean)^2}{n}} \quad (7)$$
$$mean = \frac{\sum_{i=1}^{n} x_i}{n} \quad (8)$$
Where n is the total amount of data.

Stage 4
If the sigma of every interval of the attribute is greater than 0.03, the attribute is finally distributed according to the clustering intervals. The calculated sigma and mean of the intervals become the original sigma and mean before training by the proposed method, which keeps the sigma and mean of the ANFIS at an appropriate level during the initial phase.
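The four stages can be sketched for a single attribute as below. This is a hedged illustration under several assumptions: the initial number of intervals `k_init` is passed in directly, the inner K-means uses evenly spaced starting centers for reproducibility, an empty interval is treated as insufficiently dispersed, sigma is the population standard deviation, and the loop stops at one interval. The function names are hypothetical.

```python
import math

def kmeans_1d(values, k, max_iter=100):
    """Plain 1-D K-means; returns the clusters of the last assignment."""
    lo, hi = min(values), max(values)
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return clusters

def interval_stats(cluster):
    """Return (mean, sigma) of one interval; an empty interval gives sigma 0."""
    n = len(cluster)
    if n == 0:
        return 0.0, 0.0
    mean = sum(cluster) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in cluster) / n)
    return mean, sigma

def adaptive_kmeans(values, k_init, min_sigma=0.03):
    """Return a list of (mean, sigma), one pair per accepted interval."""
    # Stage 1: min-max normalization.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    scaled = [(v - lo) / span for v in values]
    k = k_init
    while True:
        # Stage 2: cluster into k intervals.
        # Stage 3: compute mean and sigma of each interval.
        stats = [interval_stats(c) for c in kmeans_1d(scaled, k)]
        # Stage 4: accept when every interval is dispersed enough.
        if k == 1 or all(sigma >= min_sigma for _, sigma in stats):
            return stats
        k -= 1  # otherwise reduce the interval number and re-cluster
```

The returned means and sigmas are the values that would initialize the Gaussian membership functions of the corresponding attribute.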

Connection of ANFIS and ResNet
In the proposed method, ANFIS is used to generate the overlapped intervals of continuous attributes and fuzzy rules (fuzzy combinations of selected continuous attributes), and ResNet is used to identify the deep features in the generated fuzzy rules and discrete attributes. For this purpose, we connect the modified ANFIS algorithm to the structure of ResNet for deep pattern extraction and co-train the two algorithms, as shown in Fig 8. The layers in the proposed method are described below:
Layer1: Form the Gaussian membership functions (MFs) from the means and sigmas calculated by adaptive K-means. Then, these MFs are used to calculate the membership grade of the inputs.
Where mean and sigma are the learning parameters.
Layer2: Firing strengths. Each node in this layer calculates the firing strength of the i-th rule.
Layer3: Calculate the normalized firing strength of the i-th rule.

Layer4: Calculate the weighted value of each rule.
$$\bar{w}_i f_{con\_sel\_i} = \bar{w}_i \, p_{con\_sel\_i} \, x_{con\_sel\_i} \quad (12)$$
Where p_{con_sel_i} is the learning parameter.
Layer5: Concatenate the weighted rules, discrete data, and the other continuous data as inputs of ResNet18.
Layer6: Calculate the result Y' by ResNet18. Then, the algorithm calculates and feeds back the gradient to co-train the learning parameters, including the normal learning parameters of ResNet and the mean, sigma, and p_{con_sel_i} parameters of ANFIS, as shown in Eqs (9) and (12). The actual algorithm is shown in Algorithm 1.

Algorithm 1 The proposed algorithm
Input: (X_{con_sel}, X_{dis}, X_{con_other}) ∈ R^{m×d}; X_{con_sel} ∈ R^{m×d_1} is the selected continuous data.
Step 1. ANFIS(X_{con_sel}) → (WF):
• The original Gaussian membership functions (formed by means and sigmas after adaptive K-means).
• w_i, firing strengths of the i-th rule, as follows:
Where a is the number of rules.
Step 2. ResNet(WF, X_{dis}, X_{con_other}) → (Y'):
• Concatenate WF, X_{dis}, and X_{con_other} as X'.
• Calculate ResNet18(X') to get Y'. The structure of ResNet18 is shown in Fig 6.
Step 3. Backpropagation(Y', Y):
• Use the Adam optimizer [43] to transfer the gradient to Layer5. Then the gradient divides into two parts: Q_1 and Q_2 are trained by Adam (learning rate = 0.001), and W is trained by SGD (learning rate = 1e-4, momentum = 0.99).
• Pass the gradient to Layer1. Use hybrid training to train the sigmas and means.
• Early stopping mechanism (patience = 5). If the loss shows no improvement after 5 epochs, stop training.
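The early stopping rule with patience = 5 can be sketched independently of any deep learning framework. The class name `EarlyStopping` and the helper `first_stop_epoch` are hypothetical, and which loss is monitored (training or validation) is left open here because the algorithm description does not specify it.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def first_stop_epoch(losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(losses):
        if stopper.step(loss):
            return epoch
    return None
```

In a training loop, `stopper.step(loss)` would be called once per epoch and the loop would break as soon as it returns True.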
In the training process, the intervals of the selected continuous attribute x_{1i} are divided according to the standard deviation and adaptive K-means, which generates overlapped intervals. Then, the membership function turns the value of each continuous attribute into membership degrees of these intervals. By providing not just the exact continuous value but the membership degrees of overlapped intervals (e.g., big, medium, small), the algorithm gains a certain fuzziness property. Then, fuzzy rules (products of different interval membership degrees of attributes) are generated to represent different possible combinations of the selected continuous attributes (such as duration is small, count is medium, diff_srv_rate is big). After the normalization of all the w_i (rules' firing strengths), weighted fuzzy rules are formed, which represent the rules' contribution to the target. Finally, the weighted fuzzy rules, discrete data, and the remaining continuous attributes are concatenated as the input of ResNet. Compared with the original ResNet, the ANFIS-enhanced ResNet transforms the exact continuous values into membership degrees of fuzzy, overlapped intervals, generates fuzzy rules (various fuzzy combinations of selected continuous attributes), and provides more fuzzy characteristics for ResNet.
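How named fuzzy rules arise from overlapped intervals, and how they are joined with the other attributes, can be sketched as follows. The interval means and sigmas, the sample values, and the function names are made up for illustration. The Gaussian form is the commonly used one and is assumed here, and the learnable scaling of Layer4 is omitted for brevity, so the sketch concatenates normalized firing strengths rather than the weighted rules of the proposed method.

```python
import math
from itertools import product

def gaussian_mf(x, mean, sigma):
    """Gaussian membership degree (assumed standard form)."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def fuzzy_rules(sample, intervals):
    """sample: {attribute: scaled value};
    intervals: {attribute: {label: (mean, sigma)}}.
    Returns {rule text: normalized firing strength}."""
    per_attribute = []
    for name, labels in intervals.items():
        per_attribute.append(
            [(f"{name} is {label}", gaussian_mf(sample[name], m, s))
             for label, (m, s) in labels.items()])
    rules = {}
    # One rule per combination of labels; strength = product of degrees.
    for combo in product(*per_attribute):
        strength = 1.0
        for _, degree in combo:
            strength *= degree
        rules[" AND ".join(text for text, _ in combo)] = strength
    total = sum(rules.values())
    return {rule: w / total for rule, w in rules.items()}

def resnet_input(sample, intervals, discrete, other_continuous):
    """Concatenate rule strengths, discrete data, and remaining continuous data."""
    strengths = list(fuzzy_rules(sample, intervals).values())
    return strengths + list(discrete) + list(other_continuous)

# Hypothetical overlapped intervals for two selected attributes.
toy_intervals = {
    "duration": {"small": (0.0, 0.2), "big": (1.0, 0.2)},
    "count": {"small": (0.0, 0.2), "medium": (0.5, 0.2), "big": (1.0, 0.2)},
}
toy_sample = {"duration": 0.1, "count": 0.5}
toy_rules = fuzzy_rules(toy_sample, toy_intervals)
```

For this sample the strongest of the 2 x 3 = 6 rules is "duration is small AND count is medium", and `resnet_input` returns one flat vector that a network such as ResNet18 could consume after reshaping.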

Results and discussion
The simulation was carried out using Python 3.7, PyTorch, and sklearn on the NSL-KDD dataset. The performance analysis and the comparison with other studies are presented in this section.

Data description
NSL-KDD (National security lab-knowledge discovery and data mining) is a traditional benchmark dataset in intrusion detection. It is an enhanced form of the KDD99 dataset that overcomes its limitations. It is a public dataset, which can be downloaded from https://www.unb.ca/cic/datasets/nsl.html. The NSL-KDD dataset has a total of 43 attributes, including 41 normal attributes, 1 target attribute, and 1 hard attribute (the hard attribute describes the difficulty of classification). Of the 41 attributes, 7 are discrete and 34 are continuous.
The most commonly used subsets of NSL-KDD are the KDDTrain+ subset and the KDDTest+ subset. The NSL-KDD dataset can test the generalization ability of an algorithm because the traffic distributions of the KDDTrain+ subset and the KDDTest+ subset are different. Some new intrusion types exist only in the KDDTest+ subset. The KDDTrain+ subset contains 23 classes, including 22 types of attacks and normal. The KDDTest+ subset contains 38 classes, including 37 types of attacks and normal. The actual traffic distribution of KDDTrain+ and KDDTest+ is shown in Table 1.

Performance metrics
In intrusion detection, samples containing attacks can be defined as positive samples, and samples without attacks can be defined as negative samples. The results of the intrusion detection can be divided into the following four situations: TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative), as shown in Table 2. The proposed method will be evaluated by precision, recall rate, F1-Score, ACC (accuracy), DR (detection rate), and FAR (false alarm rate). These metrics can be calculated by TP, TN, FP, and FN.
Precision: The percentage of all TP samples to all positive classifications (TP+FP).
Recall rate: The percentage of all TP samples to all samples (TP+FN) that should be positive.
Accuracy (ACC): The percentage of all correct predictions.
F1-Score: Harmonic mean of precision and recall rate. A high F1-Score indicates high precision and recall. When the precision and recall are both 100%, the best F1-Score is 1. The worst F1-Score is 0.
DR (Detection Rate): The percentage of the data in a category that is successfully classified into this category.
FAR (False Alarm Rate): The percentage of the data wrongly identified as this category among the data that should not be identified as this category.
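The metrics can be computed from the four counts as in the sketch below. It uses the standard binary definitions, which is an assumption: in particular, DR is taken as TP / (TP + FN) and FAR as FP / (FP + TN), and the per-category or weighted variants used for multi-class results may differ. The function name `metrics` is hypothetical.

```python
def metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "acc": (tp + tn) / (tp + tn + fp + fn),
        "dr": recall,              # assumed: detection rate = TP / (TP + FN)
        "far": fp / (fp + tn),     # assumed: false alarm rate = FP / (FP + TN)
    }
```

For example, with 90 true positives, 80 true negatives, 20 false positives, and 10 false negatives, the accuracy is 170/200 and the false alarm rate is 20/100.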

Experimental results
We evaluated the proposed method with the NSL-KDD dataset. 80% of the KDDTrain+ subset was used as the training dataset. The full KDDTrain+ subset and the KDDTest+ subset served as the validation and test sets, respectively.
S2 Table shows the results of Pearson correlation analysis, that is, the Pearson correlation degree of the 41 normal attributes towards the target attribute. To select the continuous attributes that are more related to the target, we take an absolute correlation degree greater than 0.1 as the threshold and select 18 continuous attributes, as shown in Table 3, which reduces the dimension of data in the ANFIS-part and the number of generated rules. The full Pearson correlation matrix of the NSL-KDD dataset's 43 attributes is shown in S1 Fig.
To minimize the number of intervals of the selected continuous attributes and initialize the sigma and mean of each interval, the original number of intervals is first obtained through the standard deviation analysis, as shown in Table 3. After the proposed adaptive K-means algorithm, the final interval number, sigmas, and means of each selected continuous attribute are shown in Table 4. As a result, the number of fuzzy rules generated by the ANFIS-part is reduced to 192, as shown in the S1 Table.
Then, through Algorithm 1, the ANFIS-part of the proposed method generated fuzzy rules (fuzzy combinations of selected continuous attributes), and the ResNet-part of the proposed method identified the deep pattern in the fuzzy rules and discrete data. The whole architecture is co-trained with the loss (the difference between the actual Y and the predicted Y'). After this process, the post-training ANFIS-part rules with four main continuous attributes (dst_host_count, dst_host_srv_count, count, duration) are shown in the S1 Table. The training, validation, and testing error changes in this process are shown in Fig 9.
We can see that in the initial training phase, the model presents an overfitting problem. However, after 8 epochs, the overfitting problem begins to ease and the model keeps a good performance. The performance of the proposed method on different metrics is shown in Tables 5 and 6. According to the experimental result on the KDDTrain+ dataset, as shown in Table 5, the method presents a high detection rate and a low false alarm rate. The DR on the KDDTrain+ dataset reaches 98.88%, and the FAR on the KDDTrain+ dataset is only 1.11%. On the KDDTest+ dataset (some intrusions only exist in the KDDTest+ dataset), the proposed method shows a certain generalization ability. The DR on the Normal category reaches 96.67%. The Precision on the DOS category reaches 93.14%. Overall, the method reaches a total of 75.9% DR, 79.42% Precision, and 75.91% ACC on the KDDTest+ dataset. However, the FAR on the KDDTest+ dataset is not ideal, at 24.15%, and needs to be further improved.
The performance of the proposed method was also compared with other studies using the same dataset, such as MVO-ANN [22], FC-ANN [36], DNN [44], CSO-ANFIS [36], and the original ResNet. In terms of the classification performance on the five main categories, the comparison with these classifiers is shown in Tables 7 and 8. According to the comparison, the detection rate of the proposed method is 55.46% better than that of the original ResNet, and the false alarm rate is 55.47% lower than that of ResNet. This means that the proposed method alleviates the overfitting problem of ResNet to some extent. The performance on various metrics is also improved and is better than many other methods. Compared with other methods, the DR on the KDDTrain+ dataset is the best, 2.63% higher than MVO-ANN [22]. The FAR is only worse than MVO-ANN [22]. The F1-score is 1.25% higher than DBN+ResNet101 [45]. The recall is 0.13% higher than DBN+ResNet101 [45]. The ACC is only 0.07% lower than DBN+ResNet101 [45]. The Precision is only 0.06% lower than DBN+ResNet101 [45]. Except for U2R and R2L, in the other three main categories the ACC is superior to CSO-ANFIS, BPNN, GA-DBN, etc.

Conclusion
There are two important experiments conducted in the study. The first one is the ANFIS-enhanced ResNet structure. In this experiment, the concatenation of modified ANFIS and ResNet generates a fuzzy-enhanced architecture to improve the fuzzy ability of ResNet. The second one is the efficiency improvement of the proposed algorithm. In this experiment, we used Pearson correlation analysis to select 18 continuous attributes that are highly related to the target as the inputs of the proposed method's ANFIS-part. To determine a meaningful and minimal interval division for the ANFIS-part, the standard deviation and the proposed adaptive K-means algorithm are used to dynamically determine the original interval division of each selected continuous attribute.
The performance of the proposed method was compared with other algorithms: BPNN, GA-DBN, DNN, FC-ANN, GA-ANFIS, PSO-ANFIS, CSO-ANFIS, and original ResNet. The proposed method achieved a 98.88% detection rate and 1.11% false alarm rate on the KDDTrain+ dataset, which is better than these methods on various metrics. Meanwhile, compared with the original ResNet, the performance on various metrics has been improved, which presents an enhanced generalization ability.
Supporting information
S1 Fig. Correlation matrix between attributes of NSL-KDD dataset.